The Momentum Problem in MDL and Bayesian Prediction
Preface " Prediction is very difficult, especially about the future. " The Minimum Description Length (MDL) principle provides a powerful philosophy for learning from observations of the past [Grünwald et al., 2005; Ris-sanen, 1989]. It equates learning with compressing the observational data. As is common in science, there may be multiple contending explanations, or models, for the data. In this thesis we investigate an application of the MDL principle to prediction of the future when there are at least two such models. We will show that the regular, commonly used form of MDL can behave suboptimally and present a refinement of regular MDL that we call the Switch-Point procedure. Being based on data compression, the Switch-Point procedure may still be considered an application of the MDL principle, although it differs from the way in which MDL is usually applied. For the convenience of readers with a background in Bayesian statistics, we give an interpretation of the regular MDL procedure as an instance of Bayesian Model Averaging (BMA). As a consequence our results on MDL transfer to BMA directly. Our first contribution is to identify the momentum phenomenon, which arises when one model enables the most accurate predictions of the future given few observations of the past, but predictions based on another model become more accurate when more data are collected. Essentially, this may happen whenever the models themselves represent compound explanations. i ii Preface The momentum phenomenon will not occur, for example, if one model, M 0 , represents the conjecture that the data come from repeated tosses of a biased coin with probability 3/5 of coming up heads, and the other model, M 1 , describes the data as tosses of a coin with probability 4/7 of coming up heads. It can occur, however, if M 1 were to represent the hypothesis that the data come from a coin with unknown probability p of coming up heads. 
This latter model basically combines all specific explanations such as "the probability of coming up heads is 4/7" into the compound explanation "the probability of coming up heads may be any fixed value p". The momentum phenomenon can occur, in that case, if the relative frequency of heads in the data converges to some number f which is close to, but not equal to, 3/5. If this happens, then for few observations of the past the slightly incorrect, …
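The coin example above can be made concrete with a small numerical sketch. This is not code from the thesis; it simply compares codelengths for a fixed-bias model M0 (bias 3/5) against a compound model M1 with unknown bias p under a uniform prior, whose marginal likelihood of h heads and t tails is the standard h!·t!/(h+t+1)! (Laplace's rule of succession). The deterministic sequence used here, with relative frequency of heads 13/20 = 0.65, close to but not equal to 3/5, is an illustrative assumption, not data from the thesis.

```python
import math

def bits_M0(h, t, q=3/5):
    """Codelength in bits of h heads and t tails under M0: fixed bias q."""
    return -(h * math.log2(q) + t * math.log2(1 - q))

def bits_M1(h, t):
    """Codelength in bits under the compound model M1 with a uniform prior
    on the bias p: marginal likelihood 1 / ((n+1) * C(n, h)), so the
    codelength is log2(n+1) + log2 C(n, h)."""
    n = h + t
    return math.log2(n + 1) + math.log2(math.comb(n, h))

def advantage_of_M0(n_blocks):
    """Codelength difference bits_M1 - bits_M0 after n_blocks blocks of a
    deterministic sequence containing 13 heads and 7 tails per block, so the
    relative frequency of heads is 0.65, near but not equal to M0's 3/5.
    Positive means M0 compresses (and hence predicts) better so far."""
    h, t = 13 * n_blocks, 7 * n_blocks
    return bits_M1(h, t) - bits_M0(h, t)

# With few observations the slightly incorrect but simple M0 is ahead;
# with many observations the compound model M1 overtakes it.
print(advantage_of_M0(1))    # after 20 flips: positive, M0 ahead
print(advantage_of_M0(100))  # after 2000 flips: negative, M1 ahead
```

The sign change of the difference is the momentum phenomenon in miniature: M0 pays a small constant cost per outcome for its slightly wrong bias, while M1 pays a one-time cost of roughly (1/2)·log n bits for learning p, which is eventually recouped.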